Analysis of stop-and-frisk + camera locations in NYC

0. Data preparation and inspection

SOURCES

PREPARED DATASETS

DEMOGRAPHICS

STOP+FRISK

SURVEILLANCE

SURVEILLANCE / POPULATION

1. The number of stop+frisk incidents is closely linked to the level of surveillance

We first analyze how the number of stop+frisk incidents depends on the number of cameras. The underlying statistical model is a generalized linear model, $$ \text{average num.stops in tract} = \lambda \times \text{tract.popn}/1000. $$ Here λ is the rate of stop+frisk incidents per 1000 population, and the focus of the analysis is to understand how λ depends on the level of surveillance.

Furthermore, we'll model the actual number of stops as a Poisson random variable, with mean as specified above. This is a standard statistical model for analyzing count data.

We split the tracts into 9 groups, according to level of surveillance, and estimate λ separately for each group. (This allows us to assess the relationship between stop+frisk and surveillance, without assuming any particular form of the equation.) For this analysis, surveillance level is defined as the effective number of cameras within 200m of a given census tract, per 1000 residents. We see that the stop+frisk rate λ increases with the level of surveillance, and the relationship is roughly linear.
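As a rough sketch of the grouped estimate: under the Poisson model, the maximum-likelihood estimate of λ for a group of tracts is the total number of stops divided by the total population in thousands. The code below assumes a hypothetical list of per-tract tuples (popn, stops, surveillance_level), and it splits the tracts into equal-count groups by surveillance level. The grouping rule is an assumption; the text above does not say how the 9 groups are chosen.

```python
def group_rates(tracts, n_groups=9, min_popn=250):
    """Estimate stops per 1000 residents for tracts grouped by surveillance level.

    `tracts` is a list of (popn, stops, surveillance_level) tuples; this
    layout is hypothetical. Groups are equal-count bins of the tracts sorted
    by surveillance level, which is an assumption about how the split is made.
    Returns a list of (mean surveillance level, estimated rate) per group.
    """
    # Keep tracts with population above min_popn (a simple way to drop parks etc.).
    kept = sorted((t for t in tracts if t[0] > min_popn), key=lambda t: t[2])
    rates = []
    for g in range(n_groups):
        lo = g * len(kept) // n_groups
        hi = (g + 1) * len(kept) // n_groups
        group = kept[lo:hi]
        if not group:
            continue
        total_popn = sum(p for p, _, _ in group)
        total_stops = sum(s for _, s, _ in group)
        mean_level = sum(v for _, _, v in group) / len(group)
        # Poisson MLE with exposure popn/1000: rate = stops / (popn / 1000).
        rates.append((mean_level, 1000.0 * total_stops / total_popn))
    return rates
```

Plotting the estimated rate against the mean surveillance level of each group is one way to check whether the relationship looks roughly linear.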

Our analysis uses data for 2019. We restrict attention to census tracts with a population greater than 250, as a simple way to exclude parks and similar areas.

SANITY CHECKS

Here are some plots that support the underlying statistical model described above.

The first plot shows that the stop+frisk rate (number of stops per 1000 residents) grows with the surveillance level. This plot is very noisy.

The second and third plots show why there is so much noise. The actual number of stops in a given census tract is a small integer, mostly in the range 0-10, and so there is bound to be lots of noise in the data for a single census tract. The second plot supports the idea that the number of stop+frisk incidents is proportional to population, and the third plot is consistent with a Poisson model.

2. What else does the stop+frisk rate depend on?

As before we consider the model $$ \text{average num.stops in tract} = \lambda \times \text{tract.popn}/1000 $$ and we investigate what λ depends on. Our baseline model is a simple generalized linear model, $$ \log \lambda = \alpha + \beta\times \text{surveillance.level} + \gamma\times \text{nonwhite.fraction} $$ where α, β and γ are coefficients to be estimated from the data.
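To make the baseline model concrete, here is a minimal sketch of fitting it by Newton's method, using only the Python standard library. Because the expected count is λ × tract.popn/1000, the term log(popn/1000) enters the linear predictor as a fixed offset. The function and argument names are hypothetical, and in practice one would more likely use a statistics package's Poisson GLM routine, which also reports standard errors and p-values (not computed here).

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_poisson_rate(popn, stops, surveillance, nonwhite, n_iter=25):
    """Fit log(lambda) = alpha + beta*surveillance + gamma*nonwhite.

    The expected count is lambda * popn / 1000, so log(popn / 1000) is a
    fixed offset. Returns [alpha, beta, gamma]. Assumes at least one stop.
    """
    rows = [[1.0, s, w] for s, w in zip(surveillance, nonwhite)]
    offset = [math.log(p / 1000.0) for p in popn]
    # Start from the pooled rate so that the first Newton step is well scaled.
    coef = [math.log(sum(stops) / (sum(popn) / 1000.0)), 0.0, 0.0]
    for _ in range(n_iter):
        mu = [math.exp(sum(c * x for c, x in zip(coef, r)) + o)
              for r, o in zip(rows, offset)]
        # Score vector X^T (y - mu) and information matrix X^T diag(mu) X.
        score = [sum(r[j] * (y - m) for r, y, m in zip(rows, stops, mu))
                 for j in range(3)]
        info = [[sum(r[j] * r[k] * m for r, m in zip(rows, mu))
                 for k in range(3)] for j in range(3)]
        step = solve(info, score)
        coef = [c + d for c, d in zip(coef, step)]
        if max(abs(d) for d in step) < 1e-10:
            break
    return coef
```

A positive β then means that the stop+frisk rate is multiplied by exp(β) for each additional effective camera per 1000 residents, holding nonwhite.fraction fixed.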

Unsurprisingly, the coefficient for surveillance level (defined as effective number of cameras within 200m of the tract per 1000 residents) is positive, and highly significant (coef=0.02, p<0.001, for Queens in 2019).

The coefficient for nonwhite.fraction (defined as the fraction of residents who identify as Black or Hispanic, out of those who identify as Black or Hispanic or White) is also positive, and highly significant (coef=0.83, p<0.001, for Queens in 2019).

The β and γ coefficients vary from borough to borough, but they are consistent from 2019 to 2020. (See the chart below for the coefficient values and 95% confidence intervals.) They are consistently positive, and consistently significant.

Both β and γ remain highly significant when the two covariates are fitted together, which shows that neither effect is entirely explained by the other. In particular, it is not the case that variation in stop+frisk due to surveillance level is entirely explained by the racial mix.

SANITY CHECKS

We developed the baseline model using data from a single borough (Queens), to avoid overfitting.

3. How does surveillance depend on demographics etc.?

We have seen that stop+frisk rates depend separately on surveillance level and on the proportion of nonwhite residents. We now investigate what surveillance level depends on.

For these analyses we'll measure surveillance level by the effective number of cameras per 1000 residents in a census tract. As before we assign each camera a radius of 120m, and we measure the total area visible, then divide by the area visible to a single camera. In this section we're analysing the attributes of each area of the city, so we'll measure the area surveilled within each census tract (eff_cameras/popn). This is in contrast to the earlier analyses of stop+frisk counts, where we analysed the attributes of residents, and we measured the area surveilled within a neighbourhood of the census tract (eff_cameras_within_200m/popn).
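The effective-camera count can be sketched as follows: take the union of the 120m discs around the cameras, measure its area, and divide by the area of one disc, so that overlapping cameras count for less than isolated ones. The code below approximates the union area on a regular grid. It is a simplified illustration with hypothetical names: camera positions are assumed to be planar coordinates in metres, and the clipping to a census tract boundary (or to its 200m neighbourhood) is left out.

```python
import math

def effective_cameras(cameras, radius=120.0, cell=5.0):
    """Approximate (area of the union of camera discs) / (area of one disc).

    `cameras` is a list of (x, y) positions in metres. The union area is
    estimated by counting grid cells whose centre lies within `radius` of at
    least one camera; `cell` is the grid spacing (an arbitrary choice here).
    """
    if not cameras:
        return 0.0
    min_x = min(x for x, _ in cameras) - radius
    max_x = max(x for x, _ in cameras) + radius
    min_y = min(y for _, y in cameras) - radius
    max_y = max(y for _, y in cameras) + radius
    nx = int(math.ceil((max_x - min_x) / cell))
    ny = int(math.ceil((max_y - min_y) / cell))
    r2 = radius * radius
    covered = 0
    for i in range(nx):
        cx = min_x + (i + 0.5) * cell
        for j in range(ny):
            cy = min_y + (j + 0.5) * cell
            # A cell counts as visible if any camera covers its centre.
            if any((cx - x) ** 2 + (cy - y) ** 2 <= r2 for x, y in cameras):
                covered += 1
    return covered * cell * cell / (math.pi * r2)
```

With this definition, two cameras at the same spot count as roughly one effective camera, and two cameras far apart count as roughly two. Dividing by the tract population in thousands gives the surveillance level.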

When we take account of poverty (. ~ . + borough:med.income), the findings point in the same direction, though they are less significant. This suggests there is some degree of confounding, since greater poverty is linked with a greater proportion of nonwhite residents.

Manhattan is most likely a special case: it's a transport hub, so there are many non-resident occupants, and policing (including surveillance) may well be linked to the number of occupants rather than the number of residents.

MODEL CHOICE

The analyses are based on logistic regression. Surveillance level is clearly non-Gaussian (it's truncated at zero -- see the histogram below), so it's not sound to fit a linear regression. Instead, we have binarized it into low versus high, with a threshold of 0.18 cameras per 1000 residents (close to the median). This is a simple way to get robust results.

Our baseline model is $$ \operatorname{logit} \operatorname{Prob}(\text{high}) = \alpha_{\text{borough}} + \beta_{\text{borough}} \times \text{nonwhite.fraction} $$ and we are interested in the β coefficients, one for each borough.
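As an illustration of this model, the sketch below binarizes surveillance level at 0.18 and fits a two-parameter logistic regression for each borough by Newton's method. Since every borough has its own α and β, fitting each borough separately gives the same estimates as fitting the combined model. The record layout and function names are hypothetical, and the sketch returns point estimates only, without the standard errors needed for significance tests.

```python
import math

def fit_logistic(x, y, n_iter=50):
    """Fit logit P(y=1) = alpha + beta * x by Newton's method.

    Returns (alpha, beta). Assumes the data are not perfectly separated.
    """
    alpha, beta = 0.0, 0.0
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-(alpha + beta * xi))) for xi in x]
        # Score (gradient) of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix entries, with weights p(1-p).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        # Solve the 2x2 Newton system directly.
        det = h00 * h11 - h01 * h01
        d_alpha = (h11 * g0 - h01 * g1) / det
        d_beta = (h00 * g1 - h01 * g0) / det
        alpha, beta = alpha + d_alpha, beta + d_beta
        if max(abs(d_alpha), abs(d_beta)) < 1e-10:
            break
    return alpha, beta

def fit_by_borough(records, threshold=0.18):
    """Fit the baseline model separately for each borough.

    `records` is a list of (borough, surveillance_level, nonwhite_fraction)
    tuples (a hypothetical layout). A tract is "high" if its surveillance
    level exceeds `threshold`.
    """
    by_borough = {}
    for borough, level, nonwhite in records:
        xs, ys = by_borough.setdefault(borough, ([], []))
        xs.append(nonwhite)
        ys.append(1.0 if level > threshold else 0.0)
    return {b: fit_logistic(xs, ys) for b, (xs, ys) in by_borough.items()}
```

A positive β for a borough means that tracts with a higher nonwhite.fraction are more likely to fall in the high-surveillance group.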

SANITY CHECKS AND DATA PLOTS

4. How else do stop-and-frisk actions depend on surveillance?

We have seen that the stop+frisk rate depends on surveillance level: the higher the surveillance level, the higher the rate. We now investigate this link in more granular detail.

Suspected crime description. Does the correlation between stop+frisk rate and surveillance level depend on the suspected crime description? Yes it does: there are some suspected crimes, especially ASSAULT, CPW, ROBBERY and LARCENY, where the stop+frisk rate is highly correlated with surveillance level. Together these make up 71% of incidents. For other suspected crimes, there is no correlation.

Chance of being found innocent. Might it be that in areas with high surveillance, the police do more uncalled-for stop+frisks, and hence there are more innocent people stopped? No. There is no correlation between the chance of being found innocent and the surveillance level.

"Stopped while Black." We'd expect that the stop+frisk rate should depend on the racial mix: in areas with a higher proportion of Black residents, a higher proportion of stop+frisk incidents are likely to be of Black people. Does this ratio vary according to surveillance level? No, not significantly. (The whopping great fact is that there are many more Black people stopped than other races. This is a property of the stop-and-frisk dataset, and it doesn't seem to be linked to camera surveillance, so it's outside the scope of this study of surveillance.)

4.1 Suspected crime description

4.2 Chance of being found innocent

4.3 "Stopped while Black"